import warnings
warnings.filterwarnings('ignore')
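Ignoring every warning globally also hides deprecation notices from libraries such as MLflow. A narrower alternative, shown as a sketch rather than what this notebook does, is to suppress warnings only around a specific call with `warnings.catch_warnings()`, which restores the previous filters when the block exits. The function `noisy_step` is a hypothetical stand-in for a library call.

```python
import warnings


def noisy_step() -> int:
    """Hypothetical stand-in for a library call that emits a warning."""
    warnings.warn("this API is deprecated", DeprecationWarning)
    return 42


def run_quietly() -> int:
    # catch_warnings() saves the current filter state and restores it on exit,
    # so the "ignore" rule applies only inside this block
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return noisy_step()


result = run_quietly()
print(result)  # prints 42, with no warning shown
```

Outside the `with` block the global filters are unchanged, so warnings from other cells still reach you.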
%pip install mlflow
Requirement already satisfied: mlflow in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (2.19.0)
Note: you may need to restart the kernel to use updated packages.
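`pip` reports "Requirement already satisfied" when a package is present, but the kernel may still need a restart, or may be running a different interpreter, before it can import it. As a quick sanity check, a sketch like the one below asks the running kernel's own environment which versions it sees, using the standard-library `importlib.metadata`. `installed_version` is a hypothetical helper name, not part of MLflow.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names of the packages installed in this notebook
for name in ("mlflow", "dagshub", "dvc"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

If a package prints "not installed" even though `pip` reported it as present, restarting the kernel is the first thing to try.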
%pip install dagshub
Requirement already satisfied: dagshub in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (0.4.0)
Note: you may need to restart the kernel to use updated packages.
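`dagshub` is commonly installed next to `mlflow` so that experiments can be logged to a DagsHub-hosted MLflow server. The sketch below assumes DagsHub's `https://dagshub.com/<owner>/<repo>.mlflow` tracking-URI convention with token-based basic auth, and sets MLflow's standard `MLFLOW_TRACKING_URI`, `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` environment variables. The owner, repository name, helper `configure_dagshub_mlflow` and the `DAGSHUB_TOKEN` variable are all hypothetical placeholders.

```python
import os


def configure_dagshub_mlflow(owner: str, repo: str, token: str) -> str:
    """Point MLflow at a DagsHub-hosted tracking server via environment variables.

    owner, repo and token are placeholders you must supply yourself.
    """
    uri = f"https://dagshub.com/{owner}/{repo}.mlflow"
    # MLflow consults these variables when it talks to the tracking server
    os.environ["MLFLOW_TRACKING_URI"] = uri
    os.environ["MLFLOW_TRACKING_USERNAME"] = owner
    os.environ["MLFLOW_TRACKING_PASSWORD"] = token
    return uri


# Hypothetical repository; the token is read from the environment, not hard-coded
tracking_uri = configure_dagshub_mlflow(
    owner="your-username",
    repo="your-repo",
    token=os.environ.get("DAGSHUB_TOKEN", ""),
)
print(tracking_uri)  # https://dagshub.com/your-username/your-repo.mlflow
```

Reading the token from the environment keeps credentials out of the notebook when it is committed to Git.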
%pip install dvc
Requirement already satisfied: dvc in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (3.58.0)
Note: you may need to restart the kernel to use updated packages.
Requirement already satisfied: setuptools in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from zc.lockfile>=1.2.1->dvc) (75.6.0)
Requirement already satisfied: aiohttp in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (3.11.11)
Requirement already satisfied: cryptography>=39.0 in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from asyncssh<3,>=2.13.1->scmrepo<4,>=3.3.8->dvc) (44.0.0)
Requirement already satisfied: typing_extensions>=4.0.0 in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from asyncssh<3,>=2.13.1->scmrepo<4,>=3.3.8->dvc) (4.12.2)
Requirement already satisfied: prompt-toolkit>=3.0.36 in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from click-repl>=0.2.0->celery->dvc) (3.0.48)
Requirement already satisfied: gitdb<5,>=4.0.1 in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from gitpython>3->scmrepo<4,>=3.3.8->dvc) (4.0.11)
Requirement already satisfied: mdurl~=0.1 in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from markdown-it-py>=2.2.0->rich>=12->dvc) (0.1.2)
Requirement already satisfied: annotated-types>=0.6.0 in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from pydantic!=2.0.0,<3,>=1.9.0->gto<2,>=1.6.0->dvc) (0.7.0)
Requirement already satisfied: pydantic-core==2.27.2 in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from pydantic!=2.0.0,<3,>=1.9.0->gto<2,>=1.6.0->dvc) (2.27.2)
Requirement already satisfied: cffi>=1.17.0 in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from pygit2>=1.14.0->scmrepo<4,>=3.3.8->dvc) (1.17.1)
Requirement already satisfied: shellingham>=1.3.0 in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from typer>=0.4.1->gto<2,>=1.6.0->dvc) (1.5.4)
Requirement already satisfied: aiohappyeyeballs>=2.3.0 in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (2.4.4)
Requirement already satisfied: aiosignal>=1.1.2 in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (1.3.2)
Requirement already satisfied: frozenlist>=1.1.1 in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (1.5.0)
Requirement already satisfied: multidict<7.0,>=4.5 in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (6.1.0)
Requirement already satisfied: propcache>=0.2.0 in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (0.2.1)
Requirement already satisfied: yarl<2.0,>=1.17.0 in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from aiohttp->aiohttp-retry>=2.5.0->dvc-http>=2.29.0->dvc) (1.18.3)
Requirement already satisfied: pycparser in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from cffi>=1.17.0->pygit2>=1.14.0->scmrepo<4,>=3.3.8->dvc) (2.22)
Requirement already satisfied: smmap<6,>=3.0.1 in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from gitdb<5,>=4.0.1->gitpython>3->scmrepo<4,>=3.3.8->dvc) (5.0.1)
Requirement already satisfied: wcwidth in c:\users\pranj\onedrive\desktop\pycharmprojects\datascience\venv\lib\site-packages (from prompt-toolkit>=3.0.36->click-repl>=0.2.0->celery->dvc) (0.2.13)
import dagshub
dagshub.init(repo_owner='pranjalm04', repo_name='Machine_learning_project', mlflow=True)
Accessing as kundan7kar
Repository Machine_learning_project doesn't exist, creating it under current user.
Response (422):
b'{"message":""}'
---------------------------------------------------------------------------
RepoNotFoundError                         Traceback (most recent call last)
File ~\OneDrive\Desktop\PycharmProjects\Datascience\venv\Lib\site-packages\dagshub\common\init.py:88, in init(repo_name, repo_owner, url, root, host, mlflow, dvc, patch_mlflow)
     87 try:
---> 88     repo_api.get_repo_info()
     89 except RepoNotFoundError:

File ~\OneDrive\Desktop\PycharmProjects\Datascience\venv\Lib\site-packages\dagshub\common\api\repo.py:83, in RepoAPI.get_repo_info(self)
     82 if res.status_code == 404:
---> 83     raise RepoNotFoundError(f"Repo {self.repo_url} not found")
     84 elif res.status_code >= 400:

RepoNotFoundError: Repo https://dagshub.com/pranjalm04/Machine_learning_project not found

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
Cell In[5], line 2
      1 import dagshub
----> 2 dagshub.init(repo_owner='pranjalm04', repo_name='Machine_learning_project', mlflow=True)

File ~\OneDrive\Desktop\PycharmProjects\Datascience\venv\Lib\site-packages\dagshub\common\init.py:91, in init(repo_name, repo_owner, url, root, host, mlflow, dvc, patch_mlflow)
     89 except RepoNotFoundError:
     90     log_message(f"Repository {repo_name} doesn't exist, creating it under current user.")
---> 91     create_repo(repo_name)
     93 # Get the token for the configs
     94 token = get_token(host=host)

File ~\OneDrive\Desktop\PycharmProjects\Datascience\venv\Lib\site-packages\dagshub\upload\wrapper.py:165, in create_repo(repo_name, org_name, description, private, auto_init, gitignores, license, readme, template, host)
    163 logger.error(f"Response ({res.status_code}):\n" f"{res.content}")
    164 if res.status_code == HTTPStatus.UNPROCESSABLE_ENTITY:
--> 165     raise RuntimeError("Repository name is invalid or it already exists.")
    166 else:
    167     raise RuntimeError("Failed to create the desired repository.")

RuntimeError: Repository name is invalid or it already exists.
pip install plotly
Requirement already satisfied: plotly in c:\users\pranj\appdata\local\programs\python\python311\lib\site-packages (5.24.1)
# Importing all necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, roc_auc_score, confusion_matrix, RocCurveDisplay, classification_report, ConfusionMatrixDisplay, roc_curve
from sklearn.utils import resample
loan_applications = pd.read_csv('loan_applications.csv')
credit_features = pd.read_csv('credit_features_subset.csv')
# data_dictionary = pd.read_csv('/kaggle/input/loan-application-risk-prediction-data/loan_data_dictionary.csv')
print(loan_applications.head())
loan_applications.describe(include='all')
credit_features.describe(include='all')
loan_applications.isnull().sum()
credit_features.isnull().sum()
       UID ApplicationDate  Amount  Term        EmploymentType  \
0  4921736      03/07/2020    2000    60  Employed - full time   
1  1241981      04/02/2020    3000    60  Employed - full time   
2  5751748      02/08/2020   20000    60  Employed - full time   
3  7163425      23/09/2020   20000    60         Self employed   
4   227377      01/01/2020    5000    36  Employed - full time   

              LoanPurpose  Success  
0        Unexpected bills        0  
1  Starting new bussniess        0  
2        Business capital        0  
3    New business venture        0  
4                     car        0  
UID                                     0
ALL_AgeOfOldestAccount                  0
ALL_AgeOfYoungestAccount                0
ALL_Count                               0
ALL_CountActive                         0
ALL_CountClosedLast12Months             0
ALL_CountDefaultAccounts                0
ALL_CountOpenedLast12Months             0
ALL_CountSettled                        0
ALL_MeanAccountAge                      0
ALL_SumCurrentOutstandingBal            0
ALL_SumCurrentOutstandingBalExcMtg      0
ALL_TimeSinceMostRecentDefault          0
ALL_WorstPaymentStatusActiveAccounts    0
dtype: int64
from ydata_profiling import ProfileReport
dataset = pd.merge(loan_applications, credit_features, on='UID', how='inner')
dataset.head()
profile = ProfileReport(dataset, title="Exploratory Data Analysis Report", explorative=True)
profile.to_notebook_iframe()
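The inner join above keeps only UIDs present in both files, so any application without a credit record (or any credit record without an application) is dropped without notice. A minimal sketch on hypothetical toy frames (`loans` and `credit` are made-up stand-ins, not the notebook's data) shows how the `indicator=True` and `validate="one_to_one"` options of `pd.merge` can count what an inner join would discard before committing to it.

```python
import pandas as pd

# Hypothetical toy tables standing in for loan_applications / credit_features
loans = pd.DataFrame({"UID": [1, 2, 3], "Amount": [2000, 3000, 5000]})
credit = pd.DataFrame({"UID": [1, 3, 4], "ALL_Count": [18, 14, 38]})

# how="outer" + indicator=True labels every row with where it came from;
# validate="one_to_one" raises a MergeError if UID is duplicated on either side
check = pd.merge(loans, credit, on="UID", how="outer",
                 indicator=True, validate="one_to_one")
print(check["_merge"].value_counts().to_dict())

# Only the "both" rows survive an inner join
inner = pd.merge(loans, credit, on="UID", how="inner")
print(len(inner))
```

Comparing `len(inner)` with the row counts of the two inputs gives the same information more cheaply; the indicator column additionally tells you which side lost rows.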
#EXPERIMENT 3
data_cleaned = dataset.drop(columns=['ALL_TimeSinceMostRecentDefault','ApplicationDate'])
data_cleaned.columns
loan_purpose_counts = data_cleaned['LoanPurpose'].value_counts()
rare_categories = loan_purpose_counts[loan_purpose_counts < 50].index
data_cleaned['LoanPurpose'] = data_cleaned['LoanPurpose'].replace(rare_categories, 'Other')
print(data_cleaned.head())

# Feature combining
data_cleaned['DebtRatio'] = (
    data_cleaned['ALL_SumCurrentOutstandingBal'] / (data_cleaned['Amount'] + 1e-6)
)
data_cleaned['AccountAgeRatio'] = (
    data_cleaned['ALL_AgeOfYoungestAccount'] / (data_cleaned['ALL_AgeOfOldestAccount'] + 1e-6)
)
import mlflow

# Start an MLflow run
with mlflow.start_run(run_name='Experiment 3') as run:
    # Log parameters for the transformations
    mlflow.log_param("columns_dropped", "ALL_TimeSinceMostRecentDefault, ApplicationDate")
    mlflow.log_param("encoding_columns", "EmploymentType, LoanPurpose")
    mlflow.log_param("log_transform_columns", "Amount, ALL_SumCurrentOutstandingBal")
    
    # Log transformed DataFrame as an artifact
    data_cleaned.to_csv("data_cleaned.csv", index=False)
    mlflow.log_artifact("data_cleaned.csv")
    
    # Log metrics, e.g., number of features and description stats
    mlflow.log_metric("num_features", data_cleaned.shape[1])
    description = data_cleaned[['Amount', 'ALL_SumCurrentOutstandingBal','AccountAgeRatio','DebtRatio']].describe()
    print(description)
    for column in description.columns: 
        for stat in description.index: 
            mlflow.log_metric(f"{column}_{stat.strip('%')}", description.loc[stat, column])
    
    # Save the run ID
    run_id = run.info.run_id

print(f"Logged to MLflow with run_id {run_id}")
       UID  Amount  Term        EmploymentType LoanPurpose  Success  \
0  4921736    2000    60  Employed - full time       Other        0   
1  1241981    3000    60  Employed - full time       Other        0   
2  5751748   20000    60  Employed - full time       Other        0   
3  7163425   20000    60         Self employed       Other        0   
4   227377    5000    36  Employed - full time         car        0   

   ALL_AgeOfOldestAccount  ALL_AgeOfYoungestAccount  ALL_Count  \
0                     162                        17         18   
1                     266                        30         14   
2                      90                        52          4   
3                     163                        19         14   
4                     129                         2         38   

   ALL_CountActive  ALL_CountClosedLast12Months  ALL_CountDefaultAccounts  \
0               12                            0                         0   
1               10                            0                         4   
2                2                            0                         1   
3                6                            1                         1   
4               19                            4                         9   

   ALL_CountOpenedLast12Months  ALL_CountSettled  ALL_MeanAccountAge  \
0                            0                 6               70.94   
1                            0                 4              104.79   
2                            0                 2               68.25   
3                            0                 8               67.50   
4                            8                19               56.45   

   ALL_SumCurrentOutstandingBal  ALL_SumCurrentOutstandingBalExcMtg  \
0                         68555                               15019   
1                          2209                                2209   
2                          5108                                5108   
3                         25738                               25738   
4                          5801                                5801   

   ALL_WorstPaymentStatusActiveAccounts  
0                                     0  
1                                     7  
2                                     7  
3                                     0  
4                                     7  
             Amount  ALL_SumCurrentOutstandingBal  AccountAgeRatio  \
count   8847.000000                  8.847000e+03      8847.000000   
mean    7560.692438                  5.416258e+04         0.089602   
std     5309.138911                  1.074432e+05         0.172504   
min      500.000000                 -1.000000e+00         0.000000   
25%     3000.000000                  3.396000e+03         0.015267   
50%     6500.000000                  1.267000e+04         0.034884   
75%    10000.000000                  6.171250e+04         0.080460   
max    20000.000000                  4.004808e+06         1.000001   

         DebtRatio  
count  8847.000000  
mean      9.174910  
std      20.774177  
min      -0.001667  
25%       0.736333  
50%       2.422462  
75%       9.698741  
max     363.504667  
🏃 View run Experiment 3 at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0/runs/aed4d9b083c24af3be4fbeeb55491f77
🧪 View experiment at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0
Logged to MLflow with run_id aed4d9b083c24af3be4fbeeb55491f77
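The rare-category step in this cell counts LoanPurpose values exactly as written, so spelling and case variants are counted separately. The dummy columns printed by the Experiment 4 cell show several such variants surviving the < 50 cut ('Debt consolidation', 'Debt Consolidation' and 'debt consolidation'; 'Home Improvements', 'Home Improvments' and 'home improvements'; 'Other' and 'other'). A minimal sketch of normalizing the text before counting follows; `normalize_purpose` and the `TYPO_FIXES` mapping are hypothetical, and a real mapping would have to be built from the dataset's own `value_counts()`.

```python
import pandas as pd

def normalize_purpose(s: pd.Series) -> pd.Series:
    """Trim, lower-case and collapse internal whitespace in free-text purposes."""
    return s.str.strip().str.lower().str.replace(r"\s+", " ", regex=True)

# Hypothetical typo/plural map (an assumption, not taken from the data dictionary)
TYPO_FIXES = {
    "home improvments": "home improvements",
    "home improvement": "home improvements",
}

purposes = pd.Series(["Debt consolidation", "Debt Consolidation", "debt consolidation",
                      "Home Improvments", "Home improvement", "Other", "other"])
cleaned = normalize_purpose(purposes).replace(TYPO_FIXES)
print(cleaned.value_counts().to_dict())
```

Applied before `value_counts()` in the rare-category step, pooled variants may end up above or below the 50-row cut differently than they do now, so the set of categories mapped to 'Other' could change.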
#EXPERIMENT 1
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion 
from sklearn.compose import ColumnTransformer 
from sklearn.preprocessing import StandardScaler, MinMaxScaler, FunctionTransformer, OneHotEncoder 
from sklearn.linear_model import LogisticRegression 
from sklearn.model_selection import cross_val_score, train_test_split,GridSearchCV, StratifiedKFold 
from sklearn.metrics import f1_score, confusion_matrix
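class LogTransform(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass

    def fit(self, X, y=None):
        # No fitting needed for this transformer
        return self

    def transform(self, X):
        X = X.copy()
        # log1p compresses the large range of the two monetary columns.
        # Note: log1p(-1) = -inf, so a -1 placeholder becomes -inf here;
        # NegativeDataTransform (the next step) maps inf/-inf in
        # ALL_SumCurrentOutstandingBal back to 0.
        X['Amount'] = np.log1p(X['Amount'])
        X['ALL_SumCurrentOutstandingBal'] = np.log1p(X['ALL_SumCurrentOutstandingBal'])
        return X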
class NegativeDataTransform(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass

    def fit(self, X, y=None):
        # No fitting needed for this transformer
        return self

    def transform(self, X):
        X = X.copy()
        columns_with_negatives = X.columns[(X == -1).any()]
        X[columns_with_negatives] = X[columns_with_negatives].replace(-1, 0)
        X['ALL_SumCurrentOutstandingBal'] = X['ALL_SumCurrentOutstandingBal'].replace([np.inf, -np.inf], 0)
        return X
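The Experiment 3 describe() output hints that a -1 placeholder reaches the engineered features: DebtRatio has a negative minimum (-0.001667), which can only come from a negative balance since Amount is at least 500, and AccountAgeRatio has a maximum of 1.000001, which is exactly -1 / (-1 + 1e-6). Treating -1 as "missing" is an assumption suggested by NegativeDataTransform, not something the data dictionary excerpt here confirms. The sketch below uses hypothetical toy rows to show both effects: the ratio computed before cleaning, and the fact that LogTransform runs before NegativeDataTransform, so the -1 has already become log1p(-1) = -inf when the -1 replacement runs.

```python
import numpy as np
import pandas as pd

EPS = 1e-6  # the same denominator guard used for DebtRatio / AccountAgeRatio

# Hypothetical toy rows; the second row carries the -1 placeholder everywhere
toy = pd.DataFrame({
    "ALL_AgeOfYoungestAccount": [17, -1],
    "ALL_AgeOfOldestAccount": [162, -1],
    "ALL_SumCurrentOutstandingBal": [68555, -1],
})

# 1) Ratio built before any cleaning: -1 / (-1 + 1e-6) lands just above 1
ratio = toy["ALL_AgeOfYoungestAccount"] / (toy["ALL_AgeOfOldestAccount"] + EPS)

# 2) LogTransform step: log1p(-1) divides by zero and yields -inf
with np.errstate(divide="ignore"):
    logged = np.log1p(toy["ALL_SumCurrentOutstandingBal"])

# 3) NegativeDataTransform step: no -1 is left to replace, so it is the
#    explicit inf replacement that maps the placeholder to 0
cleaned = logged.replace([np.inf, -np.inf], 0)

print(ratio.round(6).tolist(), logged.tolist(), cleaned.tolist())
```

Masking the placeholder (for example to NaN) before computing the ratios would keep values like 1.000001 out of AccountAgeRatio; whether that is preferable depends on what -1 actually encodes.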
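class CategoryEncode(BaseEstimator, TransformerMixin):
    """One-hot encodes LoanPurpose and EmploymentType with pd.get_dummies."""
    def __init__(self):
        pass

    def fit(self, X, y=None):
        # Remember the dummy columns produced on the training data
        # (drop_first=True drops the first category of each column as the baseline)
        self.columns_ = pd.get_dummies(X, columns=['LoanPurpose', 'EmploymentType'], drop_first=True).columns
        return self

    def transform(self, X):
        # Encode without drop_first, then align to the training columns: the baseline
        # dummy is dropped, dummies absent from this batch are filled with 0, and
        # categories never seen in fit are ignored (all-zero, i.e. the baseline)
        X = pd.get_dummies(X.copy(), columns=['LoanPurpose', 'EmploymentType'])
        return X.reindex(columns=self.columns_, fill_value=0).astype(int)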

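Calling `pd.get_dummies` on each batch independently makes the output columns depend on which categories that batch happens to contain, and `drop_first=True` then drops whichever category sorts first in that batch. The sketch below uses hypothetical toy splits to show the mismatch, and one way to align a batch to the training-time columns: encode the batch in full, then `reindex` to the columns learned on the training data.

```python
import pandas as pd

# Hypothetical toy splits: the test batch contains only one category
train = pd.DataFrame({"LoanPurpose": ["car", "Other", "Debt"]})
test = pd.DataFrame({"LoanPurpose": ["car", "car"]})

# Encoding each batch independently with drop_first=True gives different columns
train_cols = pd.get_dummies(train, columns=["LoanPurpose"], drop_first=True).columns
naive_test = pd.get_dummies(test, columns=["LoanPurpose"], drop_first=True)
print(list(train_cols))          # 'Debt' sorts first, so it is the dropped baseline
print(list(naive_test.columns))  # the only category was dropped as "first"

# Encode the test batch in full, then align to the training columns:
# the 'car' indicator survives and the absent 'Other' column is filled with 0
aligned = (pd.get_dummies(test, columns=["LoanPurpose"])
             .reindex(columns=train_cols, fill_value=0)
             .astype(int))
print(aligned.to_dict("list"))
```

Applying `drop_first=True` to the test batch before reindexing would silently turn every 'car' row into the baseline, which is why the full encoding is aligned instead.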
X = data_cleaned.drop(columns=['Success'])  # Features
y = data_cleaned['Success']  # Target

numeric_features = X.select_dtypes(include=['int64', 'float64']).columns 
categorical_features = X.select_dtypes(include=['object']).columns 
print(data_cleaned.columns)
numeric_transformer = Pipeline(steps=[
    ('log', LogTransform()),
    ('negative_replace', NegativeDataTransform()),
    ('scaler', StandardScaler()),
])
category_transformer = Pipeline(steps=[('category_encode', CategoryEncode())])
preprocessor = ColumnTransformer(transformers=[
    ('numerical_features', numeric_transformer, numeric_features),
    ('categorical_features', category_transformer, categorical_features),
])
# Create final pipeline with Logistic Regression
pipeline = Pipeline(steps=[('preprocessor', preprocessor), ('classifier', LogisticRegression(solver='liblinear'))])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # 80/20 hold-out split; 10-fold CV runs inside GridSearchCV

print(len(X_test),len(y_test))
param_grid={
    'classifier__C':[0.01,0.1,1,10],
    'classifier__penalty':['l1','l2']
}
cv=StratifiedKFold(n_splits=10)
grid_search=GridSearchCV(pipeline,param_grid,cv=cv,scoring='f1',n_jobs=-1)
grid_search.fit(X_train,y_train)
best_model=grid_search.best_estimator_
best_params=grid_search.best_params_
y_pred=best_model.predict(X_test)
f1 = f1_score(y_test, y_pred) 
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f1,tn,fp,fn,tp)
with mlflow.start_run(run_name='Experiment 1') as run:
    # Log test-set metrics, the best hyperparameters and the fitted pipeline
    mlflow.log_metric('f1_score', f1)
    mlflow.log_metric('true_negatives', tn)
    mlflow.log_metric('false_positives', fp)
    mlflow.log_metric('false_negatives', fn)
    mlflow.log_metric('true_positives', tp)
    mlflow.log_params(best_params)
    mlflow.sklearn.log_model(best_model,"model")
    # Save the run ID
    run_id = run.info.run_id

log_reg_accuracy = accuracy_score(y_test, y_pred)
log_reg_precision, log_reg_recall, log_reg_fscore, _ = precision_recall_fscore_support(y_test, y_pred, average='binary')
log_reg_roc_auc = roc_auc_score(y_test, y_pred)
log_reg_cm = confusion_matrix(y_test, y_pred)

# Print Metrics
print("Logistic Regression Metrics:")
print(f"Accuracy: {log_reg_accuracy:.2f}")
print(f"Precision: {log_reg_precision:.2f}")
print(f"Recall: {log_reg_recall:.2f}")
print(f"F1 Score: {log_reg_fscore:.2f}")
print(f"ROC AUC: {log_reg_roc_auc:.2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred, target_names=['Rejected (0)', 'Approved (1)']))
print(f"Logged to MLflow with run_id {run_id}")
Index(['UID', 'Amount', 'Term', 'EmploymentType', 'LoanPurpose', 'Success',
       'ALL_AgeOfOldestAccount', 'ALL_AgeOfYoungestAccount', 'ALL_Count',
       'ALL_CountActive', 'ALL_CountClosedLast12Months',
       'ALL_CountDefaultAccounts', 'ALL_CountOpenedLast12Months',
       'ALL_CountSettled', 'ALL_MeanAccountAge',
       'ALL_SumCurrentOutstandingBal', 'ALL_SumCurrentOutstandingBalExcMtg',
       'ALL_WorstPaymentStatusActiveAccounts', 'DebtRatio', 'AccountAgeRatio'],
      dtype='object')
1770 1770
0.06349206349206349 1587 10 167 6
2024/12/21 15:57:28 WARNING mlflow.models.model: Model logged without a signature and input example. Please set `input_example` parameter when logging the model to auto infer the model signature.
🏃 View run Experiment 1 at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0/runs/1826cd2678494631b155b1b35f23dfa1
🧪 View experiment at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0
Logistic Regression Metrics:
Accuracy: 0.90
Precision: 0.38
Recall: 0.03
F1 Score: 0.06
ROC AUC: 0.51

Classification Report:
               precision    recall  f1-score   support

Rejected (0)       0.90      0.99      0.95      1597
Approved (1)       0.38      0.03      0.06       173

    accuracy                           0.90      1770
   macro avg       0.64      0.51      0.51      1770
weighted avg       0.85      0.90      0.86      1770

Logged to MLflow with run_id 1826cd2678494631b155b1b35f23dfa1
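The ROC AUC above is computed from `y_pred`, i.e. hard 0/1 labels rather than scores. With hard labels the ROC curve has a single interior point, so the area reduces to (TPR + TNR) / 2, which is balanced accuracy. The sketch below checks this arithmetic with the confusion-matrix counts printed by the Experiment 1 cell (1587, 10, 167, 6) and cross-checks it against scikit-learn on arrays rebuilt from those counts.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Counts printed by the Experiment 1 cell: tn, fp, fn, tp
cm_tn, cm_fp, cm_fn, cm_tp = 1587, 10, 167, 6

tpr = cm_tp / (cm_tp + cm_fn)   # recall of class 1 (Approved)
tnr = cm_tn / (cm_tn + cm_fp)   # recall of class 0 (Rejected)

# One interior ROC point (FPR, TPR): area = (1 + TPR - FPR) / 2 = (TPR + TNR) / 2
auc_from_labels = (tpr + tnr) / 2

# Cross-check with scikit-learn on label arrays rebuilt from the same counts
y_true_rebuilt = np.array([0] * (cm_tn + cm_fp) + [1] * (cm_fn + cm_tp))
y_pred_rebuilt = np.array([0] * cm_tn + [1] * cm_fp + [0] * cm_fn + [1] * cm_tp)
assert np.isclose(roc_auc_score(y_true_rebuilt, y_pred_rebuilt), auc_from_labels)

print(f"TPR={tpr:.4f}  TNR={tnr:.4f}  AUC from hard labels={auc_from_labels:.4f}")
```

A threshold-free AUC would instead pass predicted probabilities, e.g. `best_model.predict_proba(X_test)[:, 1]` (the fitted pipeline exposes `predict_proba` because LogisticRegression does); that number is likely to differ from the 0.51 reported here.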
pip install xgboost
Requirement already satisfied: xgboost in c:\users\pranj\appdata\local\programs\python\python311\lib\site-packages (2.1.3)
#FEATURE SELECTION EXPERIMENT 4
from xgboost import XGBClassifier
from sklearn.feature_selection import VarianceThreshold




def FeatureSelection(df, y):
    """Three-stage feature selection on the preprocessed frame `df`:
    correlation filter -> variance filter -> XGBoost feature importance."""
    # CORRELATION: drop one feature from every pair with |r| above the threshold
    corr_matrix = df.corr()
    threshold = 0.7
    corr_pairs = corr_matrix.abs().unstack().sort_values(kind="quicksort").drop_duplicates()
    high_corr_pairs = corr_pairs[(corr_pairs > threshold) & (corr_pairs < 1)]
    features_to_drop = set()
    for (feature1, feature2) in high_corr_pairs.index:
        features_to_drop.add(feature2)
    print("Features to drop:", features_to_drop)
    df_selected_corr = df.drop(columns=list(features_to_drop))
    print(df_selected_corr.columns)

    # VARIANCE THRESHOLD: 0.8 * (1 - 0.8) = 0.16 is the variance of a 0/1 column
    # whose majority value covers 80% of the rows; columns at or below it are removed
    selector = VarianceThreshold(threshold=(.8 * (1 - .8)))
    X_var = selector.fit_transform(df_selected_corr)
    selected_features_var = df_selected_corr.columns[selector.get_support()]

    # FEATURE SELECTION USING XGBClassifier feature importance
    xgb_model = XGBClassifier(eval_metric='logloss', random_state=42)
    xgb_model.fit(X_var, y)
    important_features = pd.DataFrame({
        'Feature': selected_features_var.to_list(),
        'Importance': xgb_model.feature_importances_
    }).sort_values(by='Importance', ascending=False)

    # Features with importance above 0.008 (computed for reference; the run logs the top 10)
    selected_features = important_features[important_features['Importance'] > 0.008]['Feature']
    # important_features is already sorted in descending order of importance
    top_important_features = important_features.head(10)
    top_important_features_list = top_important_features['Feature'].tolist()

    with mlflow.start_run(run_name='Experiment 4') as run:
        # Log the features surviving each selection stage
        mlflow.log_param('Features selected using Correlation', ','.join(df_selected_corr.columns))
        mlflow.log_param('Features selected using VarianceThreshold', ','.join(selected_features_var))
        mlflow.log_param('Features selected using XGboostClassifier', ','.join(top_important_features_list))
        # Save the run ID
        run_id = run.info.run_id
    return top_important_features

X = data_cleaned.drop(columns=['Success'])  # Features
y = data_cleaned['Success']  # Target
# Column names after one-hot encoding, in the order the preprocessor outputs them
X_labels = pd.get_dummies(X, columns=['LoanPurpose','EmploymentType'], drop_first=True).columns
# Fit the preprocessor on the full dataset and wrap its output in a labelled DataFrame
preprocessor.fit(X)
transformed_data = preprocessor.transform(X)
df = pd.DataFrame(transformed_data, columns=X_labels)

FeatureSelection(df, y)
Features to drop: {'ALL_Count', 'ALL_MeanAccountAge', 'ALL_CountClosedLast12Months'}
Index(['UID', 'Amount', 'Term', 'ALL_AgeOfOldestAccount',
       'ALL_AgeOfYoungestAccount', 'ALL_CountActive',
       'ALL_CountDefaultAccounts', 'ALL_CountOpenedLast12Months',
       'ALL_CountSettled', 'ALL_SumCurrentOutstandingBal',
       'ALL_SumCurrentOutstandingBalExcMtg',
       'ALL_WorstPaymentStatusActiveAccounts', 'DebtRatio', 'AccountAgeRatio',
       'LoanPurpose_Car repairs', 'LoanPurpose_Consolidation',
       'LoanPurpose_Debt', 'LoanPurpose_Debt Consolidation',
       'LoanPurpose_Debt consolidation', 'LoanPurpose_Furniture',
       'LoanPurpose_Home Improvements', 'LoanPurpose_Home Improvments',
       'LoanPurpose_Home improvement', 'LoanPurpose_Home improvements',
       'LoanPurpose_New car', 'LoanPurpose_Other',
       'LoanPurpose_Short Term Loan', 'LoanPurpose_car',
       'LoanPurpose_consolidation', 'LoanPurpose_debt consolidation',
       'LoanPurpose_home improvement', 'LoanPurpose_home improvements',
       'LoanPurpose_other', 'EmploymentType_Employed - part time',
       'EmploymentType_Retired', 'EmploymentType_Self employed'],
      dtype='object')
🏃 View run Experiment 4 at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0/runs/1ea98b0cdc5c44fa88ac40d69e4390b4
🧪 View experiment at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0
Feature Importance
11 ALL_WorstPaymentStatusActiveAccounts 0.389812
1 Amount 0.097353
7 ALL_CountOpenedLast12Months 0.063172
6 ALL_CountDefaultAccounts 0.052642
10 ALL_SumCurrentOutstandingBalExcMtg 0.050647
3 ALL_AgeOfOldestAccount 0.046745
5 ALL_CountActive 0.045500
9 ALL_SumCurrentOutstandingBal 0.041750
12 DebtRatio 0.034382
2 Term 0.032692
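FeatureSelection deduplicates the unstacked correlation pairs with `drop_duplicates()`, which works by value: it collapses each mirrored (a, b) / (b, a) pair, but it would also discard a genuinely different pair that happened to share the same |r|. A common alternative, sketched below on hypothetical toy columns (`a`, `b`, `c`, `rare_dummy`), keeps only the strict upper triangle of the correlation matrix so each unordered pair is inspected once; which member of a pair gets dropped is still an arbitrary choice. The sketch also applies the notebook's variance threshold of 0.8 * (1 - 0.8) = 0.16. Since the numeric columns of `df` were standardized (variance 1 unless constant), that filter should only be able to remove sparse dummy columns there.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
n = 1000
a = rng.normal(size=n)
toy = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.1, size=n),       # nearly collinear with "a"
    "c": rng.normal(size=n),                          # independent of both
    "rare_dummy": (np.arange(n) < 50).astype(int),    # 1 in 5% of rows
})

# 1) Correlation filter on the strict upper triangle (k=1 also skips the diagonal)
corr = toy.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.7).any()]
kept = toy.drop(columns=to_drop)

# 2) Variance filter with the notebook's threshold; rare_dummy has
#    variance 0.05 * 0.95 = 0.0475 and is removed
selector = VarianceThreshold(threshold=0.8 * (1 - 0.8))
selector.fit(kept)
print(to_drop, list(kept.columns[selector.get_support()]))
```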
pip install --upgrade scikit-learn
Requirement already satisfied: scikit-learn in c:\users\pranj\appdata\local\programs\python\python311\lib\site-packages (1.6.0)
#EXPERIMENT 2

import seaborn as sns
from sklearn.linear_model import RidgeClassifier
def confusionMatrix(cm, label):
    # confusion_matrix orders rows/columns by sorted class label:
    # 0 (rejected) first, then 1 (success)
    labels = ['rejected (0)', 'success (1)']
    # Create a confusion matrix plot
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=labels, yticklabels=labels)
    plt.xlabel('Predicted Labels')
    plt.ylabel('True Labels')
    plt.title(f'Confusion Matrix ({label})')
    confusion_matrix_path = f"confusion_matrix_{label}.png"
    plt.savefig(confusion_matrix_path)
    plt.close()
    return confusion_matrix_path
def Training(X,y):
    models = {
        'Logistic Regression': LogisticRegression(random_state=42),
        'Random Forest': RandomForestClassifier(random_state=42),
        'Ridge Classifier': RidgeClassifier(alpha=1.0)
    }
    
    metrics = {}
    
    with mlflow.start_run(run_name="Experiment 2") as run:
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) 
        for model_name, model in models.items():
            # Pipeline
           
            pipeline = Pipeline(steps=[('preprocessor', preprocessor), ('classifier', model)])
            pipeline.fit(X_train, y_train)
    
            preds = pipeline.predict(X_test)
    
            accuracy = accuracy_score(y_test, preds)
            precision, recall, fscore, _ = precision_recall_fscore_support(y_test, preds, average='binary')
            roc_auc = roc_auc_score(y_test, preds)
            cm = confusion_matrix(y_test, preds)
            
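            confusion_matrix_path = confusionMatrix(cm, model_name.lower().replace(' ', '_'))
            # Log the whole fitted pipeline (preprocessor + classifier) so the saved
            # model can be applied to raw feature columns
            mlflow.sklearn.log_model(pipeline, model_name)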
            mlflow.log_metric(f"{model_name}_accuracy", accuracy)
            mlflow.log_metric(f"{model_name}_precision", precision)
            mlflow.log_metric(f"{model_name}_recall", recall)
            mlflow.log_metric(f"{model_name}_f1_score", fscore)
            mlflow.log_metric(f"{model_name}_roc_auc", roc_auc)
    
            # Log the confusion matrix artifact
            mlflow.log_artifact(confusion_matrix_path, artifact_path=f"confusion_matrices/{model_name.lower().replace(' ', '_')}")
    
            # Store metrics in dictionary for display
            metrics[model_name] = {
                'accuracy': accuracy,
                'precision': precision,
                'recall': recall,
                'f1_score': fscore,
                'roc_auc': roc_auc
            }
    for model_name, model_metrics in metrics.items():
        print(f"\n{model_name} Metrics:")
        for metric_name, metric_value in model_metrics.items():
            print(f"{metric_name.capitalize()}: {metric_value:.2f}")
Training(X,y)
2024/12/21 15:57:45 WARNING mlflow.models.model: Model logged without a signature and input example. Please set `input_example` parameter when logging the model to auto infer the model signature.
2024/12/21 15:57:55 WARNING mlflow.models.model: Model logged without a signature and input example. Please set `input_example` parameter when logging the model to auto infer the model signature.
2024/12/21 15:58:13 WARNING mlflow.models.model: Model logged without a signature and input example. Please set `input_example` parameter when logging the model to auto infer the model signature.
🏃 View run Experiment 2 at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0/runs/2ad1d80a65d842aa88838496c90e3bd4
🧪 View experiment at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0

Logistic Regression Metrics:
Accuracy: 0.90
Precision: 0.33
Recall: 0.03
F1_score: 0.05
Roc_auc: 0.51

Random Forest Metrics:
Accuracy: 0.91
Precision: 0.61
Recall: 0.11
F1_score: 0.19
Roc_auc: 0.55

Ridge Classifier Metrics:
Accuracy: 0.90
Precision: 0.00
Recall: 0.00
F1_score: 0.00
Roc_auc: 0.50
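The ROC AUC values above are computed from hard class predictions. For a binary classifier, that AUC equals balanced accuracy, (TPR + TNR) / 2, which is consistent with the Ridge Classifier scoring 0.50 with zero recall. A minimal sketch of scoring with continuous outputs instead, on synthetic data from `make_classification` standing in for the loan data (an assumption; `positive_class_scores` is a hypothetical helper): it uses `predict_proba` where the estimator provides it and falls back to `decision_function` for `RidgeClassifier`, which has no `predict_proba`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def positive_class_scores(estimator, X):
    """Hypothetical helper: a continuous score for the positive class (label 1)."""
    if hasattr(estimator, "predict_proba"):
        return estimator.predict_proba(X)[:, 1]
    # RidgeClassifier has no predict_proba; its decision_function still ranks samples
    return estimator.decision_function(X)


# Synthetic stand-in for the loan data: about 90% negatives, 10% positives
X_demo, y_demo = make_classification(n_samples=2000, weights=[0.9], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Ridge Classifier": RidgeClassifier(alpha=1.0),
}
results = {}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    preds = clf.predict(X_te)
    results[name] = {
        # AUC from hard labels: a single ROC point, equivalent to balanced accuracy
        "auc_from_labels": roc_auc_score(y_te, preds),
        "balanced_accuracy": balanced_accuracy_score(y_te, preds),
        # AUC from continuous scores: uses the full ROC curve
        "auc_from_scores": roc_auc_score(y_te, positive_class_scores(clf, X_te)),
    }
    print(name, {k: round(float(v), 3) for k, v in results[name].items()})
```

If the same pattern holds on the real data, a score-based call inside `Training` would make the logged `*_roc_auc` metrics reflect how well each model ranks applicants rather than its performance at a single threshold.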
#Experiment 5
import numpy as np
import mlflow
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline


pca = PCA(n_components=0.95)  # keep the fewest components that together explain at least 95% of the variance
pipeline = Pipeline(steps=[('preprocessor', preprocessor), ('pca', pca)])
X_train_pca = pipeline.fit_transform(X_train)
X_test_pca = pipeline.transform(X_test)

resulting_dimensions = X_train_pca.shape[1]

# Log the results to MLflow
with mlflow.start_run(run_name="Experiment 5"):
    mlflow.log_param("pca_n_components", resulting_dimensions)
    mlflow.log_metric("explained_variance_ratio", np.sum(pca.explained_variance_ratio_))

    # Log each explained variance ratio for each component
    for i, variance in enumerate(pca.explained_variance_ratio_):
        mlflow.log_metric(f"explained_variance_ratio_pc{i+1}", variance)

# Print the results
print(f"Number of components retained after PCA: {resulting_dimensions}")
print(f"Explained variance ratio for each principal component: {pca.explained_variance_ratio_}")
🏃 View run Experiment 5 at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0/runs/b3171bb27fa148be889f248aad241c4e
🧪 View experiment at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0
Number of components retained after PCA: 15
Explained variance ratio for each principal component: [0.23234183 0.16038198 0.10809867 0.0852459  0.07553159 0.05754008
 0.05501017 0.03838504 0.03152731 0.02780045 0.02515221 0.01776631
 0.01626259 0.01438121 0.01344823]
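With `n_components=0.95`, PCA keeps the smallest number of components whose cumulative explained variance reaches 95%. The sketch below reproduces that rule from the ratios printed above: the running total is about 0.9454 after 14 components and 0.9589 after 15, hence 15. `components_for_threshold` is a hypothetical helper, and the synthetic cross-check assumes scikit-learn's float `n_components` behaviour as we understand it.

```python
import numpy as np
from sklearn.decomposition import PCA

# Explained-variance ratios printed by Experiment 5 (first 15 components)
ratios = np.array([0.23234183, 0.16038198, 0.10809867, 0.0852459, 0.07553159,
                   0.05754008, 0.05501017, 0.03838504, 0.03152731, 0.02780045,
                   0.02515221, 0.01776631, 0.01626259, 0.01438121, 0.01344823])


def components_for_threshold(explained_ratio, threshold):
    """Hypothetical helper: smallest k whose cumulative explained variance passes `threshold`."""
    cumulative = np.cumsum(explained_ratio)
    # side="right" returns the first index whose running total exceeds the threshold
    return int(np.searchsorted(cumulative, threshold, side="right") + 1)


print(components_for_threshold(ratios, 0.95))

# Cross-check on synthetic data against scikit-learn's own choice
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 20))
auto = PCA(n_components=0.95).fit(X_demo)
full = PCA().fit(X_demo)
print(auto.n_components_, components_for_threshold(full.explained_variance_ratio_, 0.95))
```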
pip install imblearn
Requirement already satisfied: imblearn in c:\users\pranj\appdata\local\programs\python\python311\lib\site-packages (0.0)
Requirement already satisfied: imbalanced-learn in c:\users\pranj\appdata\local\programs\python\python311\lib\site-packages (from imblearn) (0.13.0)
Requirement already satisfied: numpy<3,>=1.24.3 in c:\users\pranj\appdata\local\programs\python\python311\lib\site-packages (from imbalanced-learn->imblearn) (2.0.2)
Requirement already satisfied: scipy<2,>=1.10.1 in c:\users\pranj\appdata\local\programs\python\python311\lib\site-packages (from imbalanced-learn->imblearn) (1.13.1)
Requirement already satisfied: scikit-learn<2,>=1.3.2 in c:\users\pranj\appdata\local\programs\python\python311\lib\site-packages (from imbalanced-learn->imblearn) (1.6.0)
Requirement already satisfied: sklearn-compat<1,>=0.1 in c:\users\pranj\appdata\local\programs\python\python311\lib\site-packages (from imbalanced-learn->imblearn) (0.1.3)
Requirement already satisfied: joblib<2,>=1.1.1 in c:\users\pranj\appdata\local\programs\python\python311\lib\site-packages (from imbalanced-learn->imblearn) (1.4.2)
Requirement already satisfied: threadpoolctl<4,>=2.0.0 in c:\users\pranj\appdata\local\programs\python\python311\lib\site-packages (from imbalanced-learn->imblearn) (3.5.0)
Note: you may need to restart the kernel to use updated packages.
#Experiment 6
# The target is imbalanced (far more rejected than approved applications), so SMOTE oversamples the minority class to reduce the models' bias toward the majority class
from imblearn.over_sampling import SMOTE
smote = SMOTE(random_state=42)
transformed_data = preprocessor.fit_transform(X)
X_balanced, y_balanced = smote.fit_resample(transformed_data, y)
print(X_balanced)
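# sort_index() orders the counts as class 0, class 1 so they line up with the labels
before_balancing = pd.DataFrame({'Class': ['Rejected (0)', 'Approved (1)'], 'Count': y.value_counts().sort_index().to_numpy()})
after_balancing = pd.DataFrame({'Class': ['Rejected (0)', 'Approved (1)'], 'Count': pd.Series(y_balanced).value_counts().sort_index().to_numpy()})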
fig_before = px.bar(
    before_balancing,
    x='Class',
    y='Count',
    title="Target Variable Distribution Before Balancing",
    text='Count',
    color='Class',
    color_discrete_map={'Rejected (0)': 'lightcoral', 'Approved (1)': 'lightgreen'},
    template='plotly_white'
)
fig_before.show()

# Plotting after balancing
fig_after = px.bar(
    after_balancing,
    x='Class',
    y='Count',
    title="Target Variable Distribution After Balancing (SMOTE)",
    text='Count',
    color='Class',
    color_discrete_map={'Rejected (0)': 'lightcoral', 'Approved (1)': 'lightgreen'},
    template='plotly_white'
)
fig_after.show()
with mlflow.start_run(run_name="Experiment 6") as run:
    # Plot before balancing
    plt.figure(figsize=(12, 6))
    plt.subplot(1, 2, 1)
    plt.bar(before_balancing['Class'], before_balancing['Count'], color='lightblue')
    plt.title('Class Distribution Before Balancing')
    plt.xlabel('Class')
    plt.ylabel('Count')

    # Plot after balancing
    plt.subplot(1, 2, 2)
    plt.bar(after_balancing['Class'], after_balancing['Count'], color='lightgreen')
    plt.title('Class Distribution After Balancing')
    plt.xlabel('Class')
    plt.ylabel('Count')

    # Both panels belong to one figure, so save and log it as a single image
    class_balance_plot_path = "class_balance_before_after.png"

    plt.tight_layout()
    plt.savefig(class_balance_plot_path)
    plt.close()

    # Log the plot as an artifact
    mlflow.log_artifact(class_balance_plot_path, artifact_path="plots")

    # Print the location where this run's artifacts are stored
    print(f"Artifacts saved to: {run.info.artifact_uri}")
[[ 0.06096534 -1.13687812  1.11628666 ...  0.          0.
   0.        ]
 [-1.26935068 -0.68096416  1.11628666 ...  0.          0.
   0.        ]
 [ 0.36103376  1.45275783  1.11628666 ...  0.          0.
   0.        ]
 ...
 [ 0.63816262  0.0561608   0.37271191 ...  0.          0.
   0.        ]
 [-0.67739244  0.36342622  1.11628666 ...  0.          0.
   0.        ]
 [ 0.71471446 -1.91602654 -1.1144376  ...  0.7026994   0.
   0.        ]]
Artifacts saved to: mlflow-artifacts:/751772a584f746138d510c970dd2b5c2/08efe1ae2a58450d80f158e367996a1e/artifacts
🏃 View run Experiment 6 at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0/runs/08efe1ae2a58450d80f158e367996a1e
🧪 View experiment at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0
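SMOTE, as generally described, creates each synthetic sample by picking a minority-class point, choosing one of its k nearest minority-class neighbours (imbalanced-learn defaults to `k_neighbors=5`), and interpolating at a random fraction λ in [0, 1) along the line between them. The NumPy sketch below illustrates only that idea; it is not imbalanced-learn's implementation, and `smote_like_samples` is a hypothetical helper.

```python
import numpy as np


def smote_like_samples(X_minority, n_new, k=5, seed=42):
    """Hypothetical helper: SMOTE-style interpolation between minority-class neighbours.

    Illustrative only; use imblearn.over_sampling.SMOTE in practice.
    """
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class
    dists = np.linalg.norm(X_minority[:, None, :] - X_minority[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]  # k nearest minority neighbours per point

    base_idx = rng.integers(0, len(X_minority), size=n_new)  # random seed points
    nn_idx = neighbours[base_idx, rng.integers(0, k, size=n_new)]  # one random neighbour each
    lam = rng.random((n_new, 1))  # interpolation fraction in [0, 1)
    base = X_minority[base_idx]
    return base + lam * (X_minority[nn_idx] - base)


# Usage: 20 minority points in 3 dimensions, synthesise 50 more
rng = np.random.default_rng(0)
X_min = rng.normal(size=(20, 3))
synthetic = smote_like_samples(X_min, n_new=50)
print(synthetic.shape)  # (50, 3)
```

Because every synthetic row is a convex combination of two real minority rows, it always falls inside the range the minority class already spans; SMOTE adds density there rather than genuinely new information.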
#Experiment 7
# Compare the models again, this time on the SMOTE-balanced data

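transformed_data = preprocessor.fit_transform(X)
# Wrap the SMOTE-balanced matrix in a DataFrame with the transformed feature names
df = pd.DataFrame(X_balanced, columns=X_labels)

# Keep only the features chosen by FeatureSelection
selected_features = FeatureSelection(df, y_balanced)
df_train = df[selected_features['Feature'].to_list()]
print(df_train.columns)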

models = {
    'Logistic Regression': LogisticRegression(random_state=42),
    'Random Forest': RandomForestClassifier(random_state=42),
    'Ridge Classifier': RidgeClassifier(alpha=1.0)
}

metrics = {}

with mlflow.start_run(run_name="Experiment 7") as run:
    X_train, X_test, y_train, y_test = train_test_split(df_train, y_balanced, test_size=0.2, random_state=42) 
    for model_name, model in models.items():
        # The data is already preprocessed and balanced, so the pipeline only wraps the classifier
        pipeline = Pipeline(steps=[('classifier', model)])
        pipeline.fit(X_train, y_train)

        preds = pipeline.predict(X_test)

        accuracy = accuracy_score(y_test, preds)
        precision, recall, fscore, _ = precision_recall_fscore_support(y_test, preds, average='binary')
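        # Note: AUC from hard labels reduces to balanced accuracy; scores from predict_proba/decision_function give a true ROC AUC
        roc_auc = roc_auc_score(y_test, preds)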
        cm = confusion_matrix(y_test, preds)
        
        confusion_matrix_path = confusionMatrix(cm, model_name.lower().replace(' ', '_'))
        mlflow.sklearn.log_model(model, model_name)
        mlflow.log_metric(f"{model_name}_accuracy", accuracy)
        mlflow.log_metric(f"{model_name}_precision", precision)
        mlflow.log_metric(f"{model_name}_recall", recall)
        mlflow.log_metric(f"{model_name}_f1_score", fscore)
        mlflow.log_metric(f"{model_name}_roc_auc", roc_auc)

        # Log the confusion matrix artifact
        mlflow.log_artifact(confusion_matrix_path, artifact_path=f"confusion_matrices/{model_name.lower().replace(' ', '_')}")

        # Store metrics in dictionary for display
        metrics[model_name] = {
            'accuracy': accuracy,
            'precision': precision,
            'recall': recall,
            'f1_score': fscore,
            'roc_auc': roc_auc
        }
for model_name, model_metrics in metrics.items():
    print(f"\n{model_name} Metrics:")
    for metric_name, metric_value in model_metrics.items():
        print(f"{metric_name.capitalize()}: {metric_value:.2f}")
Features to drop: {'ALL_Count', 'ALL_MeanAccountAge', 'ALL_CountDefaultAccounts'}
Index(['UID', 'Amount', 'Term', 'ALL_AgeOfOldestAccount',
       'ALL_AgeOfYoungestAccount', 'ALL_CountActive',
       'ALL_CountClosedLast12Months', 'ALL_CountOpenedLast12Months',
       'ALL_CountSettled', 'ALL_SumCurrentOutstandingBal',
       'ALL_SumCurrentOutstandingBalExcMtg',
       'ALL_WorstPaymentStatusActiveAccounts', 'DebtRatio', 'AccountAgeRatio',
       'LoanPurpose_Car repairs', 'LoanPurpose_Consolidation',
       'LoanPurpose_Debt', 'LoanPurpose_Debt Consolidation',
       'LoanPurpose_Debt consolidation', 'LoanPurpose_Furniture',
       'LoanPurpose_Home Improvements', 'LoanPurpose_Home Improvments',
       'LoanPurpose_Home improvement', 'LoanPurpose_Home improvements',
       'LoanPurpose_New car', 'LoanPurpose_Other',
       'LoanPurpose_Short Term Loan', 'LoanPurpose_car',
       'LoanPurpose_consolidation', 'LoanPurpose_debt consolidation',
       'LoanPurpose_home improvement', 'LoanPurpose_home improvements',
       'LoanPurpose_other', 'EmploymentType_Employed - part time',
       'EmploymentType_Retired', 'EmploymentType_Self employed'],
      dtype='object')
🏃 View run Experiment 4 at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0/runs/19baeabbe6da4425a92cc57d7927580a
🧪 View experiment at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0
Index(['ALL_WorstPaymentStatusActiveAccounts', 'Amount', 'LoanPurpose_Other',
       'ALL_CountClosedLast12Months', 'ALL_CountOpenedLast12Months', 'Term',
       'ALL_AgeOfYoungestAccount', 'ALL_CountActive',
       'ALL_SumCurrentOutstandingBalExcMtg', 'ALL_AgeOfOldestAccount'],
      dtype='object')
2024/12/21 15:58:57 WARNING mlflow.models.model: Model logged without a signature and input example. Please set `input_example` parameter when logging the model to auto infer the model signature.
2024/12/21 15:59:15 WARNING mlflow.models.model: Model logged without a signature and input example. Please set `input_example` parameter when logging the model to auto infer the model signature.
2024/12/21 15:59:33 WARNING mlflow.models.model: Model logged without a signature and input example. Please set `input_example` parameter when logging the model to auto infer the model signature.
🏃 View run Experiment 7 at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0/runs/7ab55cd3551c4cdf8bc47664135de9b3
🧪 View experiment at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0

Logistic Regression Metrics:
Accuracy: 0.74
Precision: 0.71
Recall: 0.83
F1_score: 0.76
Roc_auc: 0.75

Random Forest Metrics:
Accuracy: 0.94
Precision: 0.94
Recall: 0.93
F1_score: 0.94
Roc_auc: 0.94

Ridge Classifier Metrics:
Accuracy: 0.74
Precision: 0.70
Recall: 0.86
F1_score: 0.77
Roc_auc: 0.74
pip install joblib
Requirement already satisfied: joblib in c:\users\pranj\appdata\local\programs\python\python311\lib\site-packages (1.4.2)
Note: you may need to restart the kernel to use updated packages.
import joblib

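transformed_data = preprocessor.fit_transform(X)
# Wrap the SMOTE-balanced matrix in a DataFrame with the transformed feature names
df = pd.DataFrame(X_balanced, columns=X_labels)

# Keep only the features chosen by FeatureSelection
selected_features = FeatureSelection(df, y_balanced)
df_train = df[selected_features['Feature'].to_list()]
print(df_train.columns)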

models = {
    'Logistic Regression': LogisticRegression(random_state=42),
    'Random Forest': RandomForestClassifier(random_state=42),
    'Ridge Classifier': RidgeClassifier(alpha=1.0)
}
X_train, X_test, y_train, y_test = train_test_split(df_train, y_balanced, test_size=0.2, random_state=42) 
metrics = {}
for model_name, model in models.items():
    with mlflow.start_run(run_name=f"Experiment {model_name}") as run:
        pipeline = Pipeline(steps=[('classifier', model)])
        pipeline.fit(X_train, y_train)

        preds = pipeline.predict(X_test)

        accuracy = accuracy_score(y_test, preds)
        precision, recall, fscore, _ = precision_recall_fscore_support(y_test, preds, average='binary')
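        # Note: AUC from hard labels reduces to balanced accuracy; scores from predict_proba/decision_function give a true ROC AUC
        roc_auc = roc_auc_score(y_test, preds)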
        cm = confusion_matrix(y_test, preds)
        
        confusion_matrix_path = confusionMatrix(cm, model_name.lower().replace(' ', '_'))
        mlflow.sklearn.log_model(model, model_name)
        mlflow.log_metric("accuracy", accuracy)
        mlflow.log_metric("precision", precision)
        mlflow.log_metric("recall", recall)
        mlflow.log_metric("f1_score", fscore)
        mlflow.log_metric("roc_auc", roc_auc)

        # Log the confusion matrix artifact
        mlflow.log_artifact(confusion_matrix_path, artifact_path=f"confusion_matrices/{model_name.lower().replace(' ', '_')}")

        # Store metrics in dictionary for display
        metrics[model_name] = {
            'model':model,
            'accuracy': accuracy,
            'precision': precision,
            'recall': recall,
            'f1_score': fscore,
            'roc_auc': roc_auc
        }
    
for model_name, model_metrics in metrics.items():
    print(f"\n{model_name} Metrics:")
    for metric_name, metric_value in model_metrics.items():
        if metric_name != 'model':
            print(f"{metric_name.capitalize()}: {metric_value:.2f}")
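# After the loop, `model` is the last entry (Ridge Classifier); save the Random Forest explicitly
joblib.dump(metrics['Random Forest']['model'], 'random_forest_model.joblib')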
Features to drop: {'ALL_Count', 'ALL_MeanAccountAge', 'ALL_CountDefaultAccounts'}
Index(['UID', 'Amount', 'Term', 'ALL_AgeOfOldestAccount',
       'ALL_AgeOfYoungestAccount', 'ALL_CountActive',
       'ALL_CountClosedLast12Months', 'ALL_CountOpenedLast12Months',
       'ALL_CountSettled', 'ALL_SumCurrentOutstandingBal',
       'ALL_SumCurrentOutstandingBalExcMtg',
       'ALL_WorstPaymentStatusActiveAccounts', 'DebtRatio', 'AccountAgeRatio',
       'LoanPurpose_Car repairs', 'LoanPurpose_Consolidation',
       'LoanPurpose_Debt', 'LoanPurpose_Debt Consolidation',
       'LoanPurpose_Debt consolidation', 'LoanPurpose_Furniture',
       'LoanPurpose_Home Improvements', 'LoanPurpose_Home Improvments',
       'LoanPurpose_Home improvement', 'LoanPurpose_Home improvements',
       'LoanPurpose_New car', 'LoanPurpose_Other',
       'LoanPurpose_Short Term Loan', 'LoanPurpose_car',
       'LoanPurpose_consolidation', 'LoanPurpose_debt consolidation',
       'LoanPurpose_home improvement', 'LoanPurpose_home improvements',
       'LoanPurpose_other', 'EmploymentType_Employed - part time',
       'EmploymentType_Retired', 'EmploymentType_Self employed'],
      dtype='object')
🏃 View run Experiment 4 at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0/runs/0d3e92f0efb54336b64105c57fdd0811
🧪 View experiment at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0
Index(['ALL_WorstPaymentStatusActiveAccounts', 'Amount', 'LoanPurpose_Other',
       'ALL_CountClosedLast12Months', 'ALL_CountOpenedLast12Months', 'Term',
       'ALL_AgeOfYoungestAccount', 'ALL_CountActive',
       'ALL_SumCurrentOutstandingBalExcMtg', 'ALL_AgeOfOldestAccount'],
      dtype='object')
2024/12/21 15:59:51 WARNING mlflow.models.model: Model logged without a signature and input example. Please set `input_example` parameter when logging the model to auto infer the model signature.
🏃 View run Experiment Logistic Regression at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0/runs/9992e39a39194c3ea3b77b94002e3263
🧪 View experiment at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0
2024/12/21 16:00:13 WARNING mlflow.models.model: Model logged without a signature and input example. Please set `input_example` parameter when logging the model to auto infer the model signature.
🏃 View run Experiment Random Forest at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0/runs/e6eae9a0d78f4affa4146111b6425c23
🧪 View experiment at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0
2024/12/21 16:00:32 WARNING mlflow.models.model: Model logged without a signature and input example. Please set `input_example` parameter when logging the model to auto infer the model signature.
🏃 View run Experiment Ridge Classifier at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0/runs/1d48d0140b214b798d430bad3b6b6dc2
🧪 View experiment at: https://dagshub.com/pranjalm04/Machine_learning_project.mlflow/#/experiments/0

Logistic Regression Metrics:
Accuracy: 0.74
Precision: 0.71
Recall: 0.83
F1_score: 0.76
Roc_auc: 0.75

Random Forest Metrics:
Accuracy: 0.94
Precision: 0.94
Recall: 0.93
F1_score: 0.94
Roc_auc: 0.94

Ridge Classifier Metrics:
Accuracy: 0.74
Precision: 0.70
Recall: 0.86
F1_score: 0.77
Roc_auc: 0.74
['random_forest_model.joblib']